THE PROBLEM


Develop statistical, physical, and computer techniques and 
methods for interpreting, summarizing, and extrapolating environ- 
mental data to support Navy requirements in research, develop- 
mental, and operational aspects of underwater detection, location, 
communications, and navigation. Specifically, study the use of 
regression models for time/space interpolation of sea-surface 
temperature observations.


RESULTS


1. A regression model considering latitude, longitude, and day- 
of-year as the independent variables, together with empirically 
determined interaction terms, was found capable of estimating the 
seasonal variation of sea-surface temperature off the west coast 
of the United States, in water depths greater than 100 fathoms, to 
a standard deviation of less than 1°F. 


2. The analysis suggests that more information than previously
suspected can be obtained from a given number of observations 
provided realistic regression models can be developed. This 
suggestion has important implications with regard to sampling. A 
sampling interval based on the model can be used in place of the 
fixed time interval employed in the classical manner with an area 
grid. The oceanographic problem becomes one of searching for 
adequate models. It is indicated adequate models can be derived 
for many ocean areas from the present archive of oceanic temper- 
ature data. 


3. On the assumption that the regression model is reasonably 
valid, the regression technique has the potential of being an effec- 
tive method for identifying and editing raw temperature data for 
erroneous observations and for detecting and isolating temperature 
anomalies. 


4. This study suggests that regression techniques may provide
the basis for a new approach to summarizing archived sea-surface 
temperature data. 


RECOMMENDATIONS 


1. Extend regression models to include depth as an independent 
variable.


2. Apply regression modeling techniques to describing the dis- 
tribution of other oceanic parameters, such as salinity. 


AUTHOR’S NOTE


This study was undertaken as a result of the following 
recommendation made in an NEL report published in 1960: 1,* 


"Explore the utility of multiple regression 
(response surfaces), or more complex 
analyses of variance, in summarizing the 
entire body of collected data, since the 
manner in which the observations are col- 
lected limits the amount of the available 
data which can be used for comparisons at 
different places and times." 


This recommendation resulted from a review of the manu- 
script of NEL Report 965 by Dr. George W. Snedecor, then Con- 
sultant in Statistics at NEL. The initial attempts to apply regres- 
sion models to summarizing sea-surface temperature data were 
undertaken jointly by Dr. Snedecor and the author. The original 
intent was to publish these results as a coauthored report. With 
the retirement of Dr. Snedecor a number of years ago this is not 
now possible. However, the author wishes to especially acknowl- 
edge Dr. Snedecor's enthusiastic motivation, interest, and con- 
tribution to the results presented in this report. 


* See references at end of report.


Oceanometrics Defined 


The work described in this paper is often referred to as the
"climatology of the oceans." Since the dictionary definition of
"climatology" refers to the atmosphere only, Dr. Snedecor pro-
posed that this aspect of oceanography be called "oceanometrics."
The use of such a word has precedents in the fields of biology and
economics, where "biometrics" and "econometrics" are well-
defined words.

Since Dr. Snedecor first proposed the word some five years 
ago, its definition has been undergoing gradual evolution. Origi- 
nally it was felt that the word implied a relationship to oceanogra- 
phy similar to the relationship of climatology to meteorology. In 
the minds of many people climatology is associated with the statis- 
tical summarization of measurements of atmospheric parameters, 
such as temperature, wind speed, and the like, with no implication 
concerning its ultimate application. 

As a result a second definition evolved suggesting that 
oceanometrics occupied a position between the extremes of "pure 
dynamical oceanography" and "climatology of the ocean." In this 
sense pure dynamical oceanography is thought of as attempting to 
construct simplified models and, from these models, to derive
laws that describe what is happening in the ocean; and climatology 
of the oceans is thought of as collecting data on oceanographic 
parameters and presenting statistical summaries of these data 
with, in the extreme case, no thought to physical theory. 

Recently a third definition has been suggested based on the 
assumption that most sciences develop in three stages: descrip- 
tion, prediction, and control. At the present time the science of 
oceanography is phasing from description to prediction. The first 
stage, usually referred to as "descriptive oceanography," is 
primarily concerned with the reporting of data collected during 
exploratory data-collection cruises. The second stage, prediction, 
is primarily concerned with the quantitative analysis of oceanic 
data. This stage involves the symbolic expression of the physics 
of ocean behavior, or mathematical modeling. It is generally 
referred to as "dynamic oceanography" if it is limited to determin- 
istic variables—that is, variables which are unencumbered by 
random variability. It is proposed that the second stage be 
referred to as "oceanometrics" if it is concerned with stochastic
variables—that is, variables which include random variability. 


INTRODUCTION


This study is the second in a series concerned with the 
analysis of sea-surface temperature observations. The first 
study ~ dealt with the effect of missing data in long time-series 
sea-surface temperature measurements on certain regression and 
autocorrelation analyses. 

For many decades observations of sea-surface temperature 
have been taken and recorded by merchant and naval vessels. 
Subsequently these observations have been catalogued and archived 
by many agencies. In the United States these agencies include the 
U. S. Naval Hydrographic Office (now the U. S. Naval Oceano-
graphic Office), the National Weather Records Center, and the
National Oceanographic Data Center. 

As the volume of data accumulated, it became the basis for 
many generalized summaries.*!” All these summaries use 
arbitrary temporal and spatial averaging. Krummel averaged the 
data over all years and all months. His areas were 5 degrees of 
latitude in the north-to-south dimension and extended east to west 
across an entire ocean. Bohnecke, the U. S. Weather Bureau, and 
the U. S. Naval Hydrographic Office used areas of 1-, 2-, or 
5-degree squares of latitude and longitude and averaged all years 
together by month or season. 

In recent years a requirement by researchers in the fields 
of fisheries oceanography, military oceanography, and meteor- 
ological oceanography for more detailed descriptions of sea- 
surface temperature distributions has developed. In response to 
this requirement the Bureau of Commercial Fisheries and the 
American Geographical Society have begun the preparation of 
more detailed charts. The Bureau of Commercial Fisheries is 
preparing detailed month-by-month charts of sea-surface 
temperature in the North Pacific, and the American Geographical
Society is preparing similar charts for the Atlantic in the area of 
the Gulf Stream. The technique used to summarize the data is 
the same as that used in the earlier studies. Both studies sum- 
marize data for a particular year by monthly time intervals. The 
Bureau of Commercial Fisheries uses 2-degree-square areas and 
the Geographical Society uses 30-minute-square areas. 

This study examines the potential of multiple-regression 
analysis as an approach to analyzing sea-surface temperature 
observations. Although multiple-regression techniques were 
developed by the statisticians many decades ago, they have rarely 
been used by oceanographers, because of the complexity inherent 
in developing realistic models and the magnitude of the arithmetic 
task required to evaluate the necessary constants. With the rapid
progress in developing high-speed digital computers, the arith- 
metic computational difficulties have been overcome to the extent 
that it is now practical to consider relatively complex models. 


THE REGRESSION MODEL 


In the use of regression analysis it is necessary to know, or 
to assume, (1) the major independent variables, or main effects;
and (2) a functional relationship between these variables, or a
regression model. In any given situation the desired functional
relationship is generally determined from analytical or theoretical 
considerations or from a study of scatter diagrams prepared from 
the data being analyzed. In this study the latter approach is used. 


The assumed independent variables are latitude, longitude, 
and day-of-year. A "point," or "cell," in the model is a 10-
minute-by-10-minute area for a 1-day time period. A 10-minute-
square area was selected, since the location of the data points is
probably not known exactly and the initial interest is in the sea- 
sonal, or day-to-day, change in temperature. 


Seasonal Variation 


Sea-surface temperature records acquired over approx- 
imately 96 years were used to establish a functional relationship 
descriptive of the seasonal variation. The data were for the fol- 
lowing locations: 


50°N       145°W       Pacific Ocean    Weather Ship PAPA
51°50'N    131°W       Pacific Ocean    St. James Island
54°10'N    133°W       Pacific Ocean    Langara Island
32°50'N    117°15'W    Pacific Ocean    Scripps Pier, La Jolla, Calif.
35°N       48°W        Atlantic Ocean   Weather Ship ECHO


One year of measurement for each location is presented in 
figure 1 to give the reader some feel for how the individual tem- 
perature measurements vary throughout the year. A subjective 
examination of the data shows a more or less regular sinusoidal 
variation with season. In addition there are variations of a few 
days' to a few weeks' duration at irregular intervals. At the open- 
ocean locations, PAPA and ECHO, the shorter-period variations 
occur less frequently and their magnitude is smaller than at the 
Figure 1. An example of the day-to-day variation of sea-surface temperature for five
locations in the eastern Pacific Ocean and the Atlantic Ocean.
coastal location, Scripps Pier. It is also recognized that there is
a diurnal variability present in these data since the daily observa- 
tions were taken at random times during the day. The latter 
short-period variations will not be included in the model, with the 
result that their effect will contribute to the unexplained variance. 

Since the seasonal variation in sea-surface temperature is 
not symmetrical about an origin, the use of the following fifth- 
degree polynomial is suggested by the scatter diagrams: 


$$ \hat{T} = a_0 + a_1 D + a_2 D^2 + a_3 D^3 + a_4 D^4 + a_5 D^5 \tag{1} $$

where $D$ is measured in days from some arbitrary origin, $\hat{T}$ is
the least-squares fitted value, or estimate, of surface tempera-
ture, and the subscripted a's are regression coefficients to be
estimated.
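
A least-squares fit of equation (1) can be sketched as follows. This is a minimal illustration in Python with NumPy, not the procedure used in the study. The function name `fit_seasonal_polynomial`, the scaling of D by 365 for numerical conditioning, and the synthetic temperature record are all assumptions introduced here.

```python
import numpy as np

def fit_seasonal_polynomial(day, temp, degree=5):
    """Least-squares fit of T = a0 + a1*D + ... + a5*D**5.

    Assumption: `day` is divided by 365 before fitting, purely for
    numerical conditioning, so the coefficients apply to D/365.
    """
    d = np.asarray(day, dtype=float) / 365.0
    # Design matrix with columns 1, d, d**2, ..., d**degree
    X = np.vander(d, degree + 1, increasing=True)
    coef, *_ = np.linalg.lstsq(X, np.asarray(temp, dtype=float), rcond=None)
    return coef, X @ coef

# Synthetic record (hypothetical values, not data from the report):
# a known fifth-degree curve plus random noise of 1 degree F.
rng = np.random.default_rng(0)
day = np.arange(0, 365, 2)
true_coef = np.array([60.0, 20.0, -150.0, 300.0, -250.0, 80.0])
clean = np.vander(day / 365.0, 6, increasing=True) @ true_coef
temp = clean + rng.normal(0.0, 1.0, size=day.size)

coef, fitted = fit_seasonal_polynomial(day, temp)
# Standard deviation about the fitted curve, with 6 fitted coefficients
residual_sd = np.std(temp - fitted, ddof=6)
```

With real observations, `day` would be the number of days from the chosen origin and `temp` the observed temperatures; missing days simply have no row.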

Equation (1) was fitted to 5 years of data taken at each of the 
five locations listed above to demonstrate the adequacy of the 
fifth-degree polynomial as an estimator of the seasonal sea-
surface temperature variation. The origin of time was taken as
July 1 and the years referred to are fiscal years. The notation
"1954" refers to fiscal year 1954, starting 1 July 1953 and end-
ing 30 June 1954.

The following related quantities are used as measures of the
"goodness of fit" of equation (1) to the observed data: R, multiple
correlation coefficient; 100R², percent variance explained by
regression; and σ, standard deviation in degrees Fahrenheit of the
observations about the regression curve.
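
The three measures can be computed from observed and fitted values as in the sketch below. The function name `goodness_of_fit` and the sample numbers are hypothetical. Dividing the residual sum of squares by n minus the number of coefficients is an assumption, since the report does not state which divisor was used for σ.

```python
import numpy as np

def goodness_of_fit(observed, fitted, n_coef):
    """Return R, 100*R**2, and sigma for a least-squares fit.

    R**2 is taken as 1 - SS_residual / SS_total, which equals the squared
    multiple correlation coefficient for a least-squares fit that
    includes a constant term.
    Assumption: sigma uses n - n_coef degrees of freedom.
    """
    observed = np.asarray(observed, dtype=float)
    fitted = np.asarray(fitted, dtype=float)
    ss_res = float(np.sum((observed - fitted) ** 2))
    ss_tot = float(np.sum((observed - observed.mean()) ** 2))
    r2 = 1.0 - ss_res / ss_tot
    sigma = float(np.sqrt(ss_res / (observed.size - n_coef)))
    return {"R": float(np.sqrt(max(r2, 0.0))), "pct_var": 100.0 * r2, "sigma": sigma}

# Hypothetical values for illustration only
obs = np.array([50.0, 52.0, 55.0, 59.0, 58.0, 54.0])
fit = np.array([50.5, 52.0, 55.5, 58.0, 58.0, 54.0])
stats = goodness_of_fit(obs, fit, n_coef=2)
```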
Weather ship PAPA (fiscal year [illegible])

Observations    254
R               0.98
100R²           95.9
σ               [illegible]


Weather ship ECHO

Observations    229
R               0.96
100R²           92.3
σ               1.2


Cape St. James

Observations    325
R               0.95
100R²           90.5
σ               [illegible]


[Fragment of a further column, station and year not recoverable: 100, 0.98, 96.4]


Fiscal year     1957    1958    1959    1960    1961


Langara Island

Observations    345     354     343     343     306
R               0.96    0.95    0.91    0.98    0.96
100R²           92.4    90.8    83.1    95.9    91.3


Scripps Pier

Observations    362     357     364     359     358
R               0.96    0.93    0.91    0.91    0.94
100R²           91.6    86.2    83.0    83.4    87.9

An examination of these statistics supports the conclusion
that a fifth-degree polynomial is an acceptable estimator of the 
seasonal variation. Relatively, the fit is best for open-ocean data 
(PAPA and ECHO), with 92 to 98 percent of the observed varia- 
bility explained; next best at island locations (St. James Island and 
Langara Island), with 83 to 96 percent of the variability explained; 
and poorest at Scripps Pier, located about 1000 feet from shore, 
with 83 to 92 percent of the variability explained. Equation (1) 
has been fitted to many other years of data with similar results. 

On figure 1 the solid line is a plot of equation (1) using the 
regression coefficients for the indicated location and year. The 
histograms show the distribution of differences between the 
observed and estimated sea-surface temperatures and the vertical 
dotted lines indicate one standard deviation of these differences.

It is of interest to note that from September 1 to January 1
the rate of cooling varies from 2.0°F per 30-day period at St.
James Island to 3.4°F per 30-day period at PAPA. The rate of
warming from May 1 to July 1 varies from 3.2°F per 30-day
period at Langara Island to 4.0°F per 30-day period at ECHO and
St. James Island.


Latitudinal Variation 


The observations used to examine the latitudinal variation in 
sea-surface temperature were taken from Punched Card Deck 116, 
U. S. Merchant Marine and Other Ship Observations, 1949- ,
of the National Weather Records Center. These are marine
weather observations which include, among other parameters 
measured, a sea-water temperature. These observations are 
usually taken by a mercury-in-glass thermometer installed in 
the ship's sea-water-intake system and are reported to the nearest 
whole degree Fahrenheit,!? Listings of all weather observations 
made in the North Pacific north of 20°N for the years 1956 and
1957 were obtained from the Records Center. 

From the punched-card deck the sea-surface temperatures 
taken north of 20°N and along a given longitude, ±0.2° of longitude,
were selected to examine the variation of surface temperature as 
a function of latitude. The longitude strips selected were: 126°, 
129°, 132°, 135°, 138°, and 141°W. The data from three of these 
strips for March and September 1956 and 1957 are plotted in 
figure 2. March and September were used to minimize the effect 
of the seasonal change in temperature. 
Figure 2. Latitudinal variation of sea-surface temperature.

A subjective study of these scatter diagrams suggests a
polynomial of the following form: 


$$ \hat{T} = a_0 + a_6 L + a_7 L^2 + a_8 L^3 \tag{2} $$


where $L$ is the latitude, $\hat{T}$ is the estimated sea-surface tempera-
ture, and the subscripted a's are the regression coefficients to
be estimated.

Equation (2) was fitted to the above 24 sets of data. The 
following statistics for the data shown in figure 2 were obtained: 


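Year    Longitude    Number of       100R²,      σ,
                     observations    percent     degrees F

MARCH

1956    126°W        [row largely illegible; surviving entry: 1.5]
        132°W        [row largely illegible; surviving entry: 16]
        138°W        [illegible]
1957    126°W        [illegible]
        132°W        [row largely illegible; surviving entry: 23]
        138°W        [illegible]

SEPTEMBER

1956    126°W        [illegible]
1957    126°W        [illegible]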

Relatively the fit is best for data taken the greatest distance 
from shore, the percent variance explained by regression varying 
systematically from 94 to 71 percent. The reader is reminded 
that the original data were reported to the nearest whole degree 
Fahrenheit and that many of the temperatures are "injection"
temperatures taken at some depth below the surface. The standard 
deviation would be expected to be greater for these data than for 
the data used to establish the seasonal variation. 

On figure 2 the solid line is a plot of equation (2) using the 
regression coefficients for the proper longitude, year, and month.

A third-degree polynomial appears to exhibit the flexibility neces- 
sary to obtain a reasonable estimation of the latitudinal variation. 

It is of interest to note that from 30° to 40°N the temperature
decreases about 1.2°F per 1-degree change in latitude, about 60
nautical miles. 


Longitudinal Variation 


The observations used to examine the longitudinal variation 
were also obtained from Punched Card Deck 116. 

From this deck the sea-surface temperatures taken along a 
given latitude, ±0.2° of latitude, were selected. Latitude strips,
at 3-degree intervals, were selected starting at 30°N. The data
for four of these strips (30°, 36°, 42°, and 48°N) for March and
September 1956 and 1957 are plotted on figure 3. 

The data suggest that a third-degree polynomial can be used 
as a model where 


$$ \hat{T} = a_0 + a_9 G + a_{10} G^2 + a_{11} G^3 \tag{3} $$

and $G$ is the longitude.

Equation (3) was fitted to the above sets of data. The fol- 
lowing statistics for the data shown on figure 3 were obtained: 


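SEPTEMBER

Year    Latitude    σ, degrees F
1956    30°N        [illegible]
        36°N        [illegible]
        42°N        2.3
        48°N        2.8
1957    30°N        2.0
        36°N        2.5
        42°N        3.0
        48°N        1.4

[The number of observations, 100R², and the March entries are not recoverable from the source.]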
Figure 3. Longitudinal variation of sea-surface temperature.

The variability in the R and the related 100R² is considerably
more in longitude than in the seasonal or latitudinal analysis,
while the variability in the standard deviation is about the same in
longitude and in the latitudinal analysis. The variability in the R
and 100R² is related to the fact that for the samples with the
smaller R and 100R² there is little change in temperature with
longitude. In other words, there is not much systematic variability
to explain. For the samples with the larger R and 100R² values,
the systematic longitudinal variation is relatively greater. Thus,
these statistics are not comparable, since they are not independent,
for any given sample, of the overall change in temperature with
longitude. On the other hand the standard deviation, which is a
measure of the random variability in the variation of the tempera-
ture, may be compared from sample to sample.
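
The point that R and 100R² depend on the size of the systematic change, while the standard deviation does not, can be illustrated with synthetic data. In the sketch below the same random noise is added to a weak and to a strong trend. All names and numbers are hypothetical, and the scaled longitude variable is an assumption made for conditioning.

```python
import numpy as np

def r2_and_sigma(x, y, degree=3):
    """Fit a polynomial of the given degree; return (R**2, sigma)."""
    X = np.vander(x, degree + 1, increasing=True)
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ coef
    r2 = 1.0 - np.sum(resid ** 2) / np.sum((y - y.mean()) ** 2)
    sigma = np.sqrt(np.sum(resid ** 2) / (y.size - (degree + 1)))
    return float(r2), float(sigma)

rng = np.random.default_rng(1)
x = np.linspace(-1.0, 1.0, 200)             # scaled longitude (hypothetical)
noise = rng.normal(0.0, 2.0, size=x.size)   # same random variability in both samples

weak = 60.0 + 0.5 * x + noise     # little change in temperature with longitude
strong = 60.0 + 8.0 * x + noise   # large systematic change with longitude

r2_weak, sigma_weak = r2_and_sigma(x, weak)
r2_strong, sigma_strong = r2_and_sigma(x, strong)
```

Both samples yield the same standard deviation about regression, because the residuals are identical, yet the percent variance explained differs greatly.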

It is concluded that in the geographical area under consider- 
ation the surface temperature is less sensitive to longitudinal 
change than to latitudinal and seasonal change, that there are 
considerable differences between latitudinal strips, and that equa- 
tion (3) is flexible enough to describe these differences. 


Interactions 


An examination of the data in figure 2 shows that the tem- 
perature variation with latitude differs from one longitude to 
another, indicating that there are interactions between latitude 
and longitude. In addition, there are indications that there are 
interactions between latitude and day, and longitude and day. It 
is necessary, therefore, to include such interactions in the model 
if the model is to be realistic. 

A trial-and-error approach was used to obtain information 
on the characteristics of the interactions to be included in the 
model. Many combinations of the main-effect terms were tried
and discarded. The following terms appear to be the most 
important for the area under consideration: 


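$$
\begin{aligned}
& a_{12} LD + a_{13} LD^2 + a_{14} LD^3 && \text{latitude by day} \\
& a_{15} GD + a_{16} GD^2 + a_{17} GD^3 && \text{longitude by day} \\
& a_{18} LG + (a_{19} G^2 + a_{20} G^3) L + (a_{21} L^2 + a_{22} L^3) G && \text{latitude by longitude}
\end{aligned}
$$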


Summary


Equations (1), (2), and (3) together with the above interaction
terms were combined to form a 22-variable regression model with 
23 coefficients to be determined by least squares: 

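$$
\begin{aligned}
\text{Surface temperature} ={}& a_0 + a_1 D + a_2 D^2 + a_3 D^3 + a_4 D^4 + a_5 D^5 && \text{day-of-year (main effect)} \\
&+ a_6 L + a_7 L^2 + a_8 L^3 && \text{latitude (main effect)} \\
&+ a_9 G + a_{10} G^2 + a_{11} G^3 && \text{longitude (main effect)} \\
&+ a_{12} LD + a_{13} LD^2 + a_{14} LD^3 && \text{latitude by day (interaction)} \\
&+ a_{15} GD + a_{16} GD^2 + a_{17} GD^3 && \text{longitude by day (interaction)} \\
&+ a_{18} LG + (a_{19} G^2 + a_{20} G^3) L + (a_{21} L^2 + a_{22} L^3) G && \text{latitude by longitude (interaction)}
\end{aligned}
\tag{4}
$$
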


where D, L, and G are the day-of-year, latitude, and longitude,
respectively. This model is applicable to an area off the west
coast of the United States extending from 20°N to 58°N and from
the coast to 150°W.
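
One way to assemble the 22 variables of equation (4) into a design matrix is sketched below. The function name `design_matrix` and the column order are illustrative choices. The example also assumes that D, L, and G have been rescaled to the interval -1 to 1, a conditioning step the report does not describe, and it uses synthetic, noise-free data only to confirm that the 23 coefficients can be recovered.

```python
import numpy as np

def design_matrix(D, L, G):
    """Columns for the constant and the 22 variables of the model.

    D, L, G: day-of-year, latitude, longitude (assumed already scaled).
    """
    D, L, G = (np.asarray(v, dtype=float) for v in (D, L, G))
    cols = [np.ones_like(D)]                     # constant term
    cols += [D ** k for k in range(1, 6)]        # day-of-year, powers 1 to 5
    cols += [L ** k for k in range(1, 4)]        # latitude, powers 1 to 3
    cols += [G ** k for k in range(1, 4)]        # longitude, powers 1 to 3
    cols += [L * D ** k for k in range(1, 4)]    # latitude by day
    cols += [G * D ** k for k in range(1, 4)]    # longitude by day
    cols += [L * G, G ** 2 * L, G ** 3 * L,      # latitude by longitude
             L ** 2 * G, L ** 3 * G]
    return np.column_stack(cols)

# Synthetic check with hypothetical coefficients and no noise
rng = np.random.default_rng(2)
n = 500
D, L, G = rng.uniform(-1.0, 1.0, (3, n))
X = design_matrix(D, L, G)
true_coef = rng.normal(size=X.shape[1])
T = X @ true_coef
est_coef, *_ = np.linalg.lstsq(X, T, rcond=None)
```

For a smaller area, the higher-order latitude and longitude columns can be dropped from the list before stacking.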


RESULTS OF REGRESSION ANALYSIS 


Equation (4), or a modification of it, was fitted to surface 
temperature observations taken by a bathythermograph in the 
areas A, B, C, and E shown in figure 4. In addition it was fitted 
to the large area shown extending from 30° to 49°N and seaward
about 650 miles. In the latter area the sea-surface temperature 
observations used were made in the four shaded 1-degree-latitude
strips (B, C, D, and E) shown in figure 4. The measurements 
were treated as a single sample drawn from the large area. 

Equation (4) was developed as a model for the largest area. 
Since the other areas cover smaller intervals of latitude and 
longitude, the terms in equation (4) that involve the higher orders 
of these variables are omitted. A 3-month overlap in time was 
used in making the least-squares fit to control the behavior of the 
fifth-degree polynomial. Thus, the data used covered an 18-month 
period extending from 1 April of a given year to 1 October of the 
following year, and the resulting equation was used to estimate 
the temperature during the included fiscal year. 
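
The 18-month fitting window can be expressed with simple date arithmetic. The sketch below follows the fiscal-year convention stated earlier (fiscal year N runs from 1 July of year N-1 to 30 June of year N). The function names are hypothetical, and treating the closing date as exclusive is an assumption.

```python
from datetime import date

def fitting_window(fiscal_year):
    """Return (start, end) of the 18-month period used for the fit.

    The window adds 3 months before and after the fiscal year:
    1 April of the preceding calendar year to 1 October of the fiscal year.
    """
    return date(fiscal_year - 1, 4, 1), date(fiscal_year, 10, 1)

def in_window(obs_date, fiscal_year):
    """True if an observation falls inside the fitting window.

    Assumption: the end date itself is excluded.
    """
    start, end = fitting_window(fiscal_year)
    return start <= obs_date < end

def day_from_origin(obs_date, fiscal_year):
    """Days from the 1 July origin; negative inside the leading overlap."""
    return (obs_date - date(fiscal_year - 1, 7, 1)).days

window_1950 = fitting_window(1950)
```

For fiscal year 1950 this gives 1 April 1949 to 1 October 1950, the period quoted for the large-area analysis.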
Figure 4. Area locator chart.

Figure 5 is an example of how the data are distributed with
respect to space and time. Data taken in waters less than 100 
fathoms in depth were excluded from the analysis. The data 
distributions for areas B, C, and E for the years 1952, 1953, and 
1954 are included. The histograms on page 40 show the monthly 
distributions and the figures on page 41 the geographical distribu- 
tions. The nonuniform data distribution with respect to time and 
space is obvious. Geographically, most of the data are in the 
area nearest shore. The number of data decreases rapidly to the 
west, many 10-minute-square areas containing no data. Area C 
is notable for its nonuniform spatial distribution. Temporally, 
most of the data were taken during the spring and summer. Area 
E is notable for its nonuniform temporal distribution. 

The distributions of the observations in the other data sets 
used in this study exhibit similar characteristics. 

Equation (4) was least-squares fitted to 5 years of data taken 
in Area A from 1951 to 1955 inclusive and in areas B, C, and E 
from 1950 to 1954 inclusive. Each data set consisted of sea- 
surface temperatures as recorded in degrees Fahrenheit on 
bathythermograms taken during the indicated year and in the 
indicated area. The results of the 20 individual regression 
analyses are presented in figure 6. For each analysis the number 
of observations, the multiple correlation coefficient, the percent 
of the variance explained by regression, and the standard devia- 
tion of the observations about regression are shown. The last 
analysis utilized the data taken in the four 1-degree-latitude 
strips. As indicated in figure 7 these measurements were treated 
as a single sample drawn from the area 30° to 49°N and extending
Figure 5. Space/time distribution of sea-surface temperature observations in Areas B, C, and E.
Figure 7. Location of data used for large-area regression model.
seaward about 650 nautical miles and for an 18-month time period
extending from 1 April 1949 to 1 October 1950. The total number 
of observations in each area and their temporal and spatial dis- 
tribution are also shown. In the shaded areas one to 20 observa- 
tions were made and in the unshaded areas no observations were 
made. Equation (4) was fitted to the data. The statistical results 
were as follows: 


Number of observations: Area B 239 
Area C 190 
Area D 176 
Area E 203 
Total 808 


100R2 percent variance explained by regression 85.7 
R, multiple correlation coefficient 0.93 


o, standard deviation in degrees Fahrenheit 
of the observations about regression 1.9 


Figure 8 shows the location in time and space of 971 temper- 
ature observations made in this area during fiscal year 1950. The 
observations were not used in obtaining the regression equation 
but were used as a control to see how well the regression equation 
could estimate independently observed sea-surface temperatures. 
The difference between the observed temperature and that esti- 
mated from regression was obtained. The results are summarized 
in figure 9. The standard deviation of the differences was 2.3°F
compared to 1.9°F for the regression equation.
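
A control comparison of this kind can be sketched as follows: coefficients are estimated from one sample and then applied, unchanged, to observations that played no part in the fit. The data here are synthetic, the cubic model stands in for equation (4), and the function name `fit_and_check` is hypothetical. The sample sizes 808 and 971 are taken from the text only to make the example concrete.

```python
import numpy as np

def fit_and_check(X_fit, y_fit, X_ctrl, y_ctrl):
    """Fit on one sample; report spread of differences on a control sample."""
    coef, *_ = np.linalg.lstsq(X_fit, y_fit, rcond=None)
    resid = y_fit - X_fit @ coef
    # Standard deviation about regression for the fitted sample
    sd_fit = float(np.sqrt(np.sum(resid ** 2) / (y_fit.size - X_fit.shape[1])))
    # Observed minus estimated for the independent control sample
    diff = y_ctrl - X_ctrl @ coef
    return sd_fit, float(np.std(diff, ddof=1)), float(diff.mean())

rng = np.random.default_rng(4)

def sample(n):
    """Synthetic observations: cubic trend plus 1.9 degree F noise (assumed)."""
    x = rng.uniform(-1.0, 1.0, n)
    X = np.vander(x, 4, increasing=True)
    y = X @ np.array([60.0, -6.0, 2.0, 1.0]) + rng.normal(0.0, 1.9, n)
    return X, y

X_fit, y_fit = sample(808)
X_ctrl, y_ctrl = sample(971)
sd_fit, sd_ctrl, bias = fit_and_check(X_fit, y_fit, X_ctrl, y_ctrl)
```

In this synthetic case both samples come from the same process, so the two standard deviations are close. With real control data a larger control value, as reported above, is to be expected.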
Figure 8. Time/space distribution of data used for sea-surface temperature comparisons.

Regression analysis appears to have considerable potential
as a technique for estimating sea-surface temperatures. However, 
the physical reality of the estimates must also be considered, 
since it is always possible to improve the statistical measures of 
goodness of a regression-model estimate by merely adding addi- 
tional terms to the model. From a physical viewpoint these terms 
may be nonsense terms. 
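
The caution about added terms can be demonstrated numerically. In the sketch below, columns of pure random numbers, which have no physical meaning, are appended one at a time to a simple model, and the fraction of variance explained never decreases. The data and names are hypothetical.

```python
import numpy as np

def r_squared(X, y):
    """Fraction of variance explained by a least-squares fit of y on X."""
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ coef
    return float(1.0 - np.sum(resid ** 2) / np.sum((y - y.mean()) ** 2))

rng = np.random.default_rng(3)
n = 60
x = np.linspace(-1.0, 1.0, n)
y = 55.0 + 5.0 * x + rng.normal(0.0, 1.5, n)   # synthetic temperatures

X = np.column_stack([np.ones(n), x])           # physically meaningful model
r2_history = [r_squared(X, y)]
for _ in range(10):
    # Append a nonsense regressor: random numbers unrelated to temperature
    X = np.column_stack([X, rng.normal(size=n)])
    r2_history.append(r_squared(X, y))
```

Because each model contains the previous one, the residual sum of squares cannot increase, which is why a physical check of the fitted model is needed alongside the statistics.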

Regression analyses of Area B (1954), Area C (1953), and 
Area E (1952) are examined to illustrate the physical reality of 
the model. The difference between observed values of tempera-
ture and the values obtained from regression will be considered as 
a function of water depth, time, latitude, and longitude. 

It was noted in studying the results of some of the earlier 
analyses that many of the large differences between the tempera- 
tures obtained from regression and those obtained by observation 
occurred in the shallower water adjacent to the coast line. This 
finding was not surprising, since transient and local effects, which 
are not accounted for in the model, should have their maximum 
influence on the temperature in such areas. 

Figures 10 and 11 present a qualitative histogram analysis 
of the effect of water depth for the three regressions. Figure 10, 
for each area, contains three histograms. The shaded portion of 
the histogram on the left shows the distribution of differences for 
data taken in water depths less than 100 fathoms, while the un-
shaded histograms include all data used in the regression analysis.
The center histograms show the distribution of differences for
observations taken in water depths greater than 100 fathoms and 
the histograms on the right show the differences for observations 
taken in water depths less than 100 fathoms. 

Figure 11 contains histograms of the differences between 
observation and regression as a function of the percent of total 
observations taken in water of less than 100 fathoms. 

An examination of the histograms, particularly those of 
figure 11, leads to the not unexpected qualitative conclusion that 
differences between regression and observation for the observa- 
tions taken in water depths less than 100 fathoms are greater than 
for those taken in deeper water. The conclusion suggests that 
variables other than latitude, longitude, and day-of-year are im- 
portant in determining the distribution of temperature in shallower 
water. 

Figure 12 shows the differences between the observed tem- 
perature and that computed from regression as a function of day- 
of-year for each of the three areas. In addition a two-standard- 
deviation interval is shown on the right. From a seasonal point of 
view the differences seem to be randomly distributed about zero 
difference. However, for time periods of a few days to tens of 
Figure 11. Differences between observed and computed sea-surface temperatures as a
function of percent of observations made in water depths less than 100 fathoms.
Figure 12. Differences between observed and computed sea-surface temperatures as a
function of day-of-year.
days, the differences are not randomly distributed. As examples,
five such time periods are noted on figure 12. It is noted that 
period four persisted for only a few days while period five appears 
to have persisted for a month or more. It is concluded that equa- 
tion (4) does estimate the seasonal variation, but does not, as 
expected, estimate the shorter-period temperature variations 
related to short-period transient phenomena. 

Figure 13 shows the differences as a function of 10-minute- 
latitude intervals. These differences appear to be randomly 
distributed about zero difference. The numbered short-time- 
period samples shown on figure 12 are also indicated on this 
figure, giving an indication as to the latitudinal extent of these 
short-period anomalies. 

Figure 14 shows the differences as a function of 10-minute 
longitudinal intervals. Again the differences appear to be ran- 
domly distributed about zero difference. Also the numbered short- 
period samples are shown, giving an indication of the longitudinal 
Figure 13. Differences between
observed and computed sea-surface
temperature as a function of latitude.
extent of the short-period anomalies. The spatial extent of the
anomalies, as shown by figures 13 and 14, may cover 1 degree of 
latitude and several degrees of longitude. For example, numbered 
sample five shows that the observed temperature was lower than 
the regression estimate over the 1 degree of latitude and over 
approximately 4 degrees of longitude for a time period of about 1 
month. 

For most of the numbered samples the data are located in 
that part of the latitude strip nearest the coast, where, oceano- 
graphically, the variation in temperature is expected to be most 
erratic, because the number of mechanisms there that affect 
temperature is greatest. 

Figures 15 through 17 are graphical representations of the 
regression models for these samples. In each figure the regres- 
sion equation is shown at the top. The narrow strip at the bottom 
shows the geographical distribution of the observations. The 
graphs above the strip show the variation of sea-surface tempera-
ture as a function of day-of-year for each of the 10-minute-by-
10-minute shaded areas. The dots are measured temperatures
within the area. To the right of the strip is a histogram showing 
the distribution of the observed data sample by months; to the 
right of the histogram the variation of temperature with longitude 
for the first day of the month is shown; and to the far right is 
shown the distribution of the differences between observed and 
estimated temperature. Pertinent statistics are given at the top 
of each figure. 

A qualitative study of these figures does not reveal any 
contradictions to generally accepted characteristics of the varia- 
tion of sea-surface temperature in the areas covered by the 
analyses. 

It is concluded that the regression model, as expressed in 
equation (4), is a physically acceptable estimator of seasonal and 
spatial variations in sea-surface temperature in the areas covered 
by these analyses.

Figure 15. Graphical representation of the regression model for the 30° to 31° latitudinal strip.

Figure 16. Graphical representation of the regression model for the 36° to 37° latitudinal strip.

[Figure panel text: FISCAL YEAR 1953. STATISTICS: number of observations, percent variance explained, multiple correlation coefficient, standard deviation. Horizontal axis: longitude, degrees W. Regression equation with term groups as labeled in the figure:]

$$
\begin{aligned}
T ={}& a_0 + a_1 D + a_2 D^2 + a_3 D^3 + a_4 D^4 + a_5 D^5 && \text{(day-of-year)} \\
&+ a_6 L && \text{(latitude)} \\
&+ a_7 G + a_8 G^2 + a_9 G^3 && \text{(longitude)} \\
&+ a_{10} LD && \text{(latitude $\times$ day-of-year)} \\
&+ a_{11} GD + a_{12} GD^2 + a_{13} GD^3 && \text{(longitude $\times$ day-of-year)} \\
&+ a_{14} LG + a_{15} LG^2 + a_{16} LG^3 && \text{(latitude $\times$ longitude)}
\end{aligned}
$$

Figure 17. Graphical representation of the regression model for the 48° to 49° latitudinal strip.


DISCUSSION


Several interesting observations are suggested by a study of 
the 21 regression analyses considered above. 


Time/Space Consistency


Perhaps the most obvious is the consistency in the statistical
parameters with respect to both year and area. In any analysis
technique this consistency is important. If the results from year
to year and area to area varied widely, then the statistical model
would have little predictive potential. It is noted that: (1) The
multiple-regression correlation coefficients vary from 0.84 to
0.99, with 50 percent of the coefficients in the 0.88-to-0.91
interval. (2) The percent of variance explained by regression
varied from 71 to 97 percent, with 50 percent in the
79-to-86-percent interval. (3) The standard deviations varied from
0.9 to 1.9°F, with over 50 percent in the 1.1-to-1.5°F interval.
The median standard deviation was 1.2°F. Since the
surface-temperature data were obtained from bathythermograms, the
data have an instrumental error of from 0.5 to perhaps 1.0°F, the
magnitude selected depending upon the reader's personal feeling
regarding bathythermogram accuracy. The instrumental error
represents the noise in the data. The difference between the
instrumental error and the standard deviation, about 0.5 to 1.0°F,
could possibly be a systematic variation not considered in the
model. The model does not include the diurnal variation, a
systematic variation of about this magnitude. It is anticipated
that the inclusion of this variable would decrease the standard
deviation to a value near the instrumental error. Thus, it appears
reasonable to suggest that a simple statistical model, such as
equation (4), using bathythermogram data, will describe the
seasonal and spatial variation of sea-surface temperatures to one
standard deviation of something less than 1°F.
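
The three statistics quoted above follow directly from a least-squares fit once the model is written as a design matrix. The following is a minimal sketch in Python with NumPy, not the program used in this study. It assumes a 17-term form for the model (a fifth-degree polynomial in day-of-year D, a linear term in latitude L, a cubic in longitude G, and LD, GD^k, and LG^k interaction terms for k = 1 to 3), variables already scaled to about [-1, 1], and a standard deviation computed with n - p degrees of freedom. The function names, the scaling, and the synthetic data are assumptions made for illustration.

```python
import numpy as np

def design_matrix(d, l, g):
    """Columns of the assumed 17-term model for scaled day d, latitude l, longitude g."""
    cols = [np.ones_like(d)]                    # constant
    cols += [d**k for k in range(1, 6)]         # day-of-year, degrees 1..5
    cols += [l]                                 # latitude
    cols += [g**k for k in range(1, 4)]         # longitude, degrees 1..3
    cols += [l * d]                             # latitude x day-of-year
    cols += [g * d**k for k in range(1, 4)]     # longitude x day-of-year
    cols += [l * g**k for k in range(1, 4)]     # latitude x longitude
    return np.column_stack(cols)

def fit_statistics(X, t):
    """Least-squares fit; returns coefficients and the three summary statistics."""
    coef, *_ = np.linalg.lstsq(X, t, rcond=None)
    resid = t - X @ coef
    ss_res = float(resid @ resid)
    ss_tot = float(np.sum((t - t.mean()) ** 2))
    r2 = 1.0 - ss_res / ss_tot
    sigma = np.sqrt(ss_res / (len(t) - X.shape[1]))   # assumes n - p degrees of freedom
    return coef, {"pct_var": 100.0 * r2, "R": float(np.sqrt(r2)), "sigma": float(sigma)}

# Synthetic observations (hypothetical values, degrees F) with 1 degree F noise.
rng = np.random.default_rng(0)
n = 300
d = rng.uniform(-1.0, 1.0, n)
l = rng.uniform(-0.5, 0.5, n)
g = rng.uniform(-1.0, 1.0, n)
t = 60.0 + 5.0 * np.cos(np.pi * d) - 2.0 * g + rng.normal(0.0, 1.0, n)

X = design_matrix(d, l, g)
coef, stats = fit_statistics(X, t)
print(X.shape, stats)
```

With noise of 1°F in the synthetic data, the fitted standard deviation should come out close to 1°F, which mirrors the argument that the residual scatter approaches the instrumental error once the systematic variation is modeled.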


Time/Space Distribution


The temporal and spatial distribution of the observations is 
of interest. A study of the distribution of the data (fig. 5) sug- 
gests that temporally the distribution is the most unsatisfactory in 
Area E, since no observations were made from October to March; 
and that spatially it is most unsatisfactory in Area C, since in one 
10-minute square, near the 100-fathom contour, 32, 75, and 87 
observations were taken in 1952, 1953, and 1954, respectively. 
The observations represent 12, 30, and 30 percent of the total 
data taken in their respective years. 

An examination of the variation of the statistical measures
shown in figure 6 suggests that the data distributions are not
unsatisfactory, as originally thought, but are quite satisfactory.
An examination of the regression model supports this contention.
The model for the day-to-day variation is a fifth-degree
polynomial. If this model truly represents the seasonal variation
of surface temperature, then it is necessary only to have
observations during oceanographic summer and winter, since in
order for the model to fit the data taken during the seasonal
extremes, it must, by nature of the model, fit the data taken
during the periods of spring warming and autumn cooling. Thus,
the taking of additional data during the latter seasons neither
adds to nor detracts from the results obtained. Similar reasoning
applies to the spatial distribution of data. If a third-degree
polynomial describes the longitudinal variation of surface
temperature, then it is necessary only to have a few observations
distributed over the area to determine the shape of the
polynomial. Again, the taking of additional observations is
unnecessary. In support of this observation, an additional fit
was made to the same 1954 data shown in figure 6, except that only
22 observations, picked randomly from the original 87
observations taken in the 10-minute square under consideration,
were used. The results follow:


Area C (1954)                            Data Set 1    Data Set 2

Number of observations                      286           221
100R², percent variance explained           75.0          74.9
R, multiple correlation coefficient         0.87          0.87
σ, standard deviation in degrees F          1.8           [illegible]


The almost identical results of the two regression analyses sug- 
gest that the additional 65 observations used in the first analysis 
did not contribute any additional information, and that the abnormal 
spatial distribution of data did not distort the statistical analysis.
The implication is important. A sampling interval based on a
model representing the distribution of the variable should be used 
in place of the fixed time interval employed in the classical 
manner with an area grid. The oceanographic problem becomes 
one of searching for adequate models. It is believed that adequate 
models can be developed for many oceanic areas through examina- 
tion of historical data and present knowledge of oceanic dynamics. 
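
The thinning experiment described above can be imitated on synthetic data. The sketch below is an illustration under stated assumptions, not a reproduction of the Area C analysis: it uses a reduced model (a fifth-degree polynomial in scaled day-of-year plus a cubic in scaled longitude), a hypothetical temperature field with 1°F noise, and 87 of 286 observations crowded into one narrow longitude cell. The crowded cell is then thinned to 22 randomly chosen observations and the fit is repeated. All names and numerical values other than the observation counts are invented for the example.

```python
import numpy as np

def design(d, g):
    """Fifth-degree polynomial in scaled day d plus cubic in scaled longitude g."""
    cols = [np.ones_like(d)] + [d**k for k in range(1, 6)] + [g**k for k in range(1, 4)]
    return np.column_stack(cols)

def fit_summary(d, g, t):
    """Return (number of observations, percent variance explained, standard deviation)."""
    X = design(d, g)
    coef, *_ = np.linalg.lstsq(X, t, rcond=None)
    resid = t - X @ coef
    ss_res = float(resid @ resid)
    ss_tot = float(np.sum((t - t.mean()) ** 2))
    sigma = float(np.sqrt(ss_res / (len(t) - X.shape[1])))
    return len(t), 100.0 * (1.0 - ss_res / ss_tot), sigma

rng = np.random.default_rng(1)

def observe(d, g):
    """Hypothetical temperature field plus 1 degree F noise."""
    return 60.0 + 5.0 * np.cos(np.pi * d) - 2.0 * g + rng.normal(0.0, 1.0, d.size)

# 199 scattered observations, plus 87 crowded into one narrow longitude cell.
d_s, g_s = rng.uniform(-1, 1, 199), rng.uniform(-1, 1, 199)
d_c, g_c = rng.uniform(-1, 1, 87), rng.uniform(0.90, 0.93, 87)
t_s, t_c = observe(d_s, g_s), observe(d_c, g_c)

keep = rng.choice(87, size=22, replace=False)   # thin the crowded cell to 22

set1 = fit_summary(np.concatenate([d_s, d_c]),
                   np.concatenate([g_s, g_c]),
                   np.concatenate([t_s, t_c]))
set2 = fit_summary(np.concatenate([d_s, d_c[keep]]),
                   np.concatenate([g_s, g_c[keep]]),
                   np.concatenate([t_s, t_c[keep]]))
print(set1)
print(set2)
```

If the model form is adequate, the two summaries should differ only slightly, which is the behavior reported for Data Sets 1 and 2.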


Data Screening 


It may be possible to use regression techniques to identify 
and eliminate erroneous data from data samples. Saur!? discusses 
this problem. Erroneous measurements are particularly trouble- 
some when the conventional space/time methods of data summar- 
ization are used, since the number of observations in any given 
cell is generally small, and erroneous data can often distort 
arithmetic means and standard deviations of discrete samples. 
Since the average for any given cell is independent of the average 
in any other space/time cell, the contribution of an erroneous 
observation tends to be maximized. Frequently a biased average 
results that must be compensated for in some manner, generally 
subjective, in the subsequent contour analysis. Thus, the problem 
of editing out erroneous observations is of considerable importance. 
Provided the regression model is reasonably valid, the regression 
technique may offer an effective, and objective, method for editing
out the erroneous observations. Since space and time are treated
simultaneously in the regression model rather than separately as 
in the conventional space/time averaging approach, a single 
observation is not overly weighted in the averaging process. 

Data taken in the 30°N latitude strip for fiscal year 1950 will
be used to illustrate this editing technique. The original set of
raw data contained 199 observations taken in the 18-month period
centered on fiscal year 1950. The left-hand section of figure 18
shows the statistical results obtained by fitting equation (4) to
this complete data set. In addition, a histogram of the differences
between the sea-surface temperature obtained from the regression
equation and the observed value is presented. It is noted that
there are three differences greater than ±3 standard deviations and
eight more differences greater than ±2 standard deviations. The
original data for these 11 observations were examined and in all
cases real errors were found. The correspondence suggests that
gross errors in data sets may be detected by means of a regression
model and eliminated by rejecting data whose differences are
greater than some multiple of the standard deviation. The center
and right-hand sections present results obtained by rejecting data
whose differences were greater than ±3 and ±2 standard deviations,
respectively. If, for one reason or another, it is not desirable
to eliminate the erroneous data, the regression technique affords
a method of rapidly identifying the data badly in error. Once
identified, data can be examined, corrected, and salvaged for
subsequent analysis.
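
The editing rule described above (fit, compute differences, flag those beyond a chosen multiple of the standard deviation, refit) can be sketched as follows. This is a minimal single-pass illustration, assuming any design matrix X; a straight-line model and invented data with three planted gross errors are used here only to keep the example short. The function name `screen` and all numerical values are hypothetical.

```python
import numpy as np

def screen(X, t, k=3.0):
    """Flag observations whose difference from the fit exceeds k standard deviations."""
    coef, *_ = np.linalg.lstsq(X, t, rcond=None)
    diff = t - X @ coef
    sigma = np.sqrt(diff @ diff / (len(t) - X.shape[1]))
    flagged = np.abs(diff) > k * sigma
    # Refit once with the flagged observations rejected.
    coef_clean, *_ = np.linalg.lstsq(X[~flagged], t[~flagged], rcond=None)
    return flagged, coef, coef_clean

# Hypothetical data: a linear trend with 0.5 degree F noise and three gross errors.
rng = np.random.default_rng(2)
x = np.linspace(-1.0, 1.0, 199)
t = 60.0 + 4.0 * x + rng.normal(0.0, 0.5, x.size)
bad = [10, 50, 120]
t[bad] += 15.0                       # stand-in for a misread or mis-keyed value

X = np.column_stack([np.ones_like(x), x])
flagged, coef, coef_clean = screen(X, t, k=3.0)
print(np.flatnonzero(flagged))       # indices to examine, correct, or reject
```

The flagged indices can either be dropped, as in the refit, or passed back for inspection and correction of the original records.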

Several such analyses were made on different data sets and 
in all cases the data identified by large differences were found to 
contain real errors.

Figure 18. An example of the use of a regression equation to edit raw sea-surface temperature data.


Anomaly Detection 


Regression models could be used as anomaly detectors. In 
this application a model, such as equation (4), could be used to 
remove the systematic variations in latitude, longitude, and day- 
of-year. Through a study of the differences between observation 
and regression (anomalies), information on nonsystematic and 
other systematic space/time changes would be obtained. The 
anomalous variations could be examined in terms of causes and 
mechanisms. This application of regression models was alluded 
to in the discussion of figures 12 to 14, in which it was noted that 
the differences revealed short-period, small-area, nonsystematic 
anomalies. 
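
A minimal sketch of this use follows, assuming the regression estimate is available as a function of position for the day in question. The differences (observed minus computed) are averaged in 1-degree cells so that areas of consistently negative or positive difference stand out. The stand-in surface `computed`, the function name, and the three observations are hypothetical.

```python
import math
from collections import defaultdict

def mean_difference_by_cell(observations, computed):
    """Average (observed - computed) temperature in each 1-degree cell.

    observations: iterable of (lat_n, lon_w, temp_f) tuples.
    computed: callable (lat_n, lon_w) -> regression estimate in degrees F.
    """
    totals = defaultdict(lambda: [0.0, 0])
    for lat, lon, temp in observations:
        cell = (math.floor(lat), math.floor(lon))
        totals[cell][0] += temp - computed(lat, lon)   # negative: observed is colder
        totals[cell][1] += 1
    return {cell: total / count for cell, (total, count) in totals.items()}

# Hypothetical stand-in for the fitted surface on one day, and three observations.
def computed(lat, lon):
    return 62.0 - 0.5 * (lat - 30.0)

observations = [
    (35.2, 121.5, 57.4),    # 2.0 degrees F colder than computed
    (35.7, 121.1, 57.15),   # 2.0 degrees F colder than computed
    (32.5, 118.3, 60.95),   # 0.2 degrees F warmer than computed
]
differences = mean_difference_by_cell(observations, computed)
for cell in sorted(differences):
    print(cell, round(differences[cell], 2))
```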

An additional example of this use of regression models may 
be found in the differences associated with the data used in figure 
9. It is well known that upwelling of cold water occurs off the
coast of California from about 30°N to 45°N from March to July.
The phenomenon is associated with the north-northwest winds that
prevail off the coast of California during these months. The
upwelling results in summer and autumn surface temperatures
considerably lower than those expected on a seasonal basis alone.
The colder-than-expected surface temperatures are centered in
the vicinity of 35°N and 40°N. Figure 19 shows the differences
for July between the observed surface temperatures and surface 
temperatures computed from regression. A negative sign means 
the observed temperature was lower than estimated. The anom- 
alous effect of upwelling on surface temperature is obvious. 
Parenthetically it is noted that since it is known that these anom- 
alies are the result of a north-northwest wind pattern, they could, 
in principle, be removed by introducing the wind vector as an 
independent variable in the regression equation.

Figure 19. July differences between observed and computed sea-surface temperatures
illustrating the use of a regression model for anomaly detection.


Summarization of Historical Data


Regression techniques could also supply a new approach to 
summarizing historical sea-surface temperature data. It is 
assumed that it is possible to develop a realistic regression model. 
In this study the model was derived by a pseudo-objective method 
involving a trial-and-error approach to determining the inter- 
action terms. Before regression modeling can become completely 
satisfactory as a method of sea-surface temperature summariza- 
tion, it will be necessary to develop objective methods of deter- 
mining the main effects and their interactions. If it is assumed 
that a physically acceptable regression model can be developed, 
it might still be asked how such a model can yield estimates of the 
day-to-day and location-to-location sea-surface temperature. An 
unpublished NEL study suggests that an 8-to-10-year time-series 
record of sea-surface temperatures is long enough to produce 
reliable long-term estimates that are independent of the time 
period of observation. Thus, if there is available a 10-year 
record of sea-surface temperatures covering the area for which 
the regression model was developed, the regression equation can 
be fitted to each year of data to provide 10 yearly sets of regres- 
sion coefficients. A sample could then be drawn from each yearly 
distribution and combined into a composite sample which could be 
considered a sample drawn from the 10-year time period. The 
regression equation could then be fitted to this composite sample
to produce a regression equation that would represent the temporal
and spatial variation of sea-surface temperature independently of 
year-to-year effects. This would be analogous to the climatic 
charts of the meteorologist. 
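
One possible reading of the compositing step is sketched below: an equal-size random draw is taken from each year's observations and the draws are pooled, so that no single heavily sampled year dominates the composite fit. This interpretation, the function name, and the record layout are assumptions; the text does not specify how the yearly samples are to be drawn. Three years are used instead of ten for brevity.

```python
import random

def composite_sample(yearly, per_year, seed=0):
    """Pool an equal-size random draw from each year's observations."""
    rng = random.Random(seed)
    pooled = []
    for year in sorted(yearly):
        obs = yearly[year]
        # Take at most per_year records; a sparse year contributes all it has.
        pooled.extend(rng.sample(obs, min(per_year, len(obs))))
    return pooled

# Hypothetical records (day_of_year, lat_n, lon_w, temp_f) with unequal yearly counts.
counts = {1949: 120, 1950: 199, 1951: 60}
yearly = {
    year: [(day, 36.5, 123.0, 55.0) for day in range(1, n + 1)]
    for year, n in counts.items()
}
pooled = composite_sample(yearly, per_year=50)
print(len(pooled))
```

The pooled records would then be passed to the same least-squares fit used for a single year.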

In the absence of any other information this regression 
equation will give the best estimate of sea-surface temperature 
and its variance for any latitude, longitude, and day-of-year. 
Year-to-year variations, which of course do exist, are neglected. 
To improve on this estimate it is necessary to consider the year- 
to-year variation. This might be done as follows: Assume that 
some observations have been made during the past several months 
over the area. The composite surface could be adjusted to the 
new data by a least-squares adjustment of the origin of the regres- 
sion equation to pass the surface through the currently observed 
data. The adjusted surface will then be the best estimate of sea- 
surface temperature for any future day. 
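
If the adjustment of the origin is taken to mean a change in the constant term only, the least-squares solution is simple: the shift that minimizes the sum of squared differences between the recent observations and the shifted surface is the mean of the differences between observed and computed temperatures. The sketch below illustrates this reading; the function name and the temperatures are hypothetical.

```python
from statistics import fmean

def origin_shift(observed, computed):
    """Constant that minimizes sum((obs - (comp + shift))**2): the mean difference."""
    return fmean([o - c for o, c in zip(observed, computed)])

# Hypothetical recent observations and composite-surface estimates (degrees F).
observed = [58.1, 57.6, 59.0]
composite = [57.0, 56.8, 58.0]

shift = origin_shift(observed, composite)
adjusted = [c + shift for c in composite]   # surface moved to pass through current data
print(round(shift, 2))
```

After the shift, the differences between the observations and the adjusted surface sum to zero, while the shape of the surface (all other coefficients) is unchanged.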

Contour charts, such as the one illustrated by figure 20, could
be prepared for any particular day of the year if desired. This
particular chart was prepared for 8 November 1949, using equation
(4) fitted to the data taken in fiscal year 1950 in the large
area. A comparison of this chart with a chart prepared using more
classical techniques shows excellent agreement.

Figure 20. Sea-surface temperature contours for 8 November 1949 computed from a
regression model.


CONCLUSIONS


1. A regression model considering latitude, longitude, and 
day-of-year as the independent variables together with empirically 
determined interaction terms, was found capable of estimating 
the seasonal variation of sea-surface temperatures off the west 
coast of the United States, in water depths greater than 100 
fathoms, to one standard deviation of something less than 1°F.

2. From both a statistical and a physical viewpoint rela- 
tively simple regression models have a considerable potential as 
estimators of seasonal and spatial variation in sea-surface tem- 
perature. 

3. The analysis suggests that more information than previously
suspected can be obtained from a given number of observations,
provided realistic regression models can be developed. This has
important implications with regard to sampling. A sampling
interval based on the model can be used in place of the fixed
time interval employed in the classical manner with an area grid.
The oceanographic problem becomes one of searching for adequate
models. The results indicate that such models can be derived for
many ocean areas from the present archive of oceanic temperature
data.

4. On the assumption that the regression model is reasonably
valid, the regression technique has the potential of being
an effective, and objective, method for identifying and editing 
raw temperature data for erroneous observations. 

5. When used to remove the seasonal and spatial variation 
in a set of sea-surface temperature data, a regression model, 
such as discussed in this study, may be used to detect and isolate 
temperature anomalies. 

6. Finally, this study suggests regression techniques may
provide a new approach to summarizing archived sea-surface
temperature data, one that is more objective and more amenable
to computer usage than presently used methods.


RECOMMENDATIONS


As an outgrowth of this study the following investigations are 
indicated: 

1. Determination of how large an area can be covered by 
one regression surface. 

2. Determination of optimum sampling procedures and 
sample sizes. 

3. Extension of regression models to include depth as an 
independent variable. 

4. Application of regression modeling techniques to describing
the distribution of other oceanic parameters, such as salinity.